13 research outputs found
Fighting Authorship Linkability with Crowdsourcing
Massive amounts of contributed content -- including traditional literature,
blogs, music, videos, reviews and tweets -- are available on the Internet
today, with authors numbering in many millions. Textual information, such as
product or service reviews, is an important and increasingly popular type of
content that is being used as a foundation of many trendy community-based
reviewing sites, such as TripAdvisor and Yelp. Some recent results have shown
that, due partly to their specialized/topical nature, sets of reviews authored
by the same person are readily linkable based on simple stylometric features.
In practice, this means that individuals who author more than a few reviews
under different accounts (whether within one site or across multiple sites) can
be linked, which represents a significant loss of privacy.
In this paper, we start by showing that the problem is actually worse than
previously believed. We then explore ways to mitigate authorship linkability in
community-based reviewing. We first attempt to harness the global power of
crowdsourcing by engaging random strangers into the process of re-writing
reviews. As our empirical results (obtained from Amazon Mechanical Turk)
clearly demonstrate, crowdsourcing yields impressively sensible reviews that
reflect sufficiently different stylometric characteristics such that prior
stylometric linkability techniques become largely ineffective. We also consider
using machine translation to automatically re-write reviews. Contrary to what
was previously believed, our results show that translation decreases authorship
linkability as the number of intermediate languages grows. Finally, we explore
the combination of crowdsourcing and machine translation and report on the
results
Exploring linkability of user reviews
Large numbers of people all over the world read and contribute to various review sites. Many contributors are understandably concerned about privacy in general and, specifically, about linkability of their reviews (and accounts) across multiple review sites. In this paper, we study linkability of communitybased reviewing and try to answer the question: to what extent are ”anonymous ” reviews linkable, i.e., highly likely authored by the same contributor? Based on a very large set of reviews from one very popular site (Yelp), we show that a high percentage of ostensibly anonymous reviews can be accurately linked to their authors. This is despite the fact that we use very simple models and equally simple features set. Our study suggests that contributors reliably expose their identities in reviews. This has important implications for cross-referencing accounts between different review sites. Also, techniques used in our study could be adopted by review sites to give contributors feedback about linkability of their reviews.
Optimizing Bi-Directional Low-Latency Communication in Named Data Networking
Content-Centric Networking (CCN) is an alternative to today’s Internet IP-style packet-switched host-centric networking. One key feature of CCN is its focus on content distribution, which dominates current Internet traffic and which is not well-served by IP. Named Data Networking (NDN) is an instance of CCN; it is an on-going research effort aiming to design and develop a full-blown candidate future Internet architecture. Although NDN’s emphasizes content distribution, it must also support other types of traffic, such as conferencing (audio, video) as well as more historical applications, such as remote login and file transfer. However, suitability of NDN for applications that are not obviously or primarily content-centric. We believe that such applications are not going away any time soon. In this paper, we explore NDN in the context of a class of applications that involve lowlatency bi-directional (point-to-point) communication. Specifically, we propose a few architectural amendments to NDN that provide significantly better throughput and lower latency for this class of applications by reducing routing and forwarding costs. The proposed approach is validated via experiments. 1